This study focuses on embodied agents that can follow natural language instructions to complete complex tasks in a visually-perceived environment. Existing methods rely on a large amount of (instruction, gold trajectory) pairs to learn a good policy. The high data cost and poor sample efficiency prevents the development of versatile agents that are capable of many tasks and can learn new tasks quickly. In this work, we propose a novel method, LLM-Planner, that harnesses the power of large language models (LLMs) such as GPT-3 to do few-shot planning for embodied agents. We further propose a simple but effective way to enhance LLMs with physical grounding to generate plans that are grounded in the current environment. Experiments on the ALFRED dataset show that our method can achieve very competitive few-shot performance, even outperforming several recent baselines that are trained using the full training data despite using less than 0.5% of paired training data. Existing methods can barely complete any task successfully under the same few-shot setting. Our work opens the door for developing versatile and sample-efficient embodied agents that can quickly learn many tasks.
translated by 谷歌翻译
我们研究了开发自主代理的问题,这些自主代理可以遵循人类的指示来推断和执行一系列行动以完成基础任务。近年来取得了重大进展,尤其是对于短范围的任务。但是,当涉及具有扩展动作序列的长匹马任务时,代理可以轻松忽略某些指令或陷入长长指令中间,并最终使任务失败。为了应对这一挑战,我们提出了一个基于模型的里程碑的任务跟踪器(M-Track),以指导代理商并监视其进度。具体而言,我们提出了一个里程碑构建器,该建筑商通过导航和交互里程碑标记指令,代理商需要逐步完成,以及一个系统地检查代理商当前里程碑的进度并确定何时继续进行下一个的里程碑检查器。在具有挑战性的Alfred数据集上,我们的M轨道在两个竞争基本模型中,未见成功率的相对成功率显着提高了33%和52%。
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
目的:加速径向采样的扩散加权自旋回波(RAD-DW-SE)采集方法,以生成高质量的表观扩散系数(ADC)地图。方法:开发了一种深度学习方法,以从用RAD-DW-SE方法获取的未采样的DWI数据生成准确的ADC映射重建。深度学习方法将卷积神经网络(CNN)与Vison变形金刚集成在一起,以生成从无效的DWI数据中生成高质量的ADC图,该数据由单指数ADC模型拟合项正常化。对147只小鼠的DWI数据进行了培训,并对36只小鼠的DWI数据进行了评估,其采样率为4倍和8倍。结果:消融研究和实验结果表明,所提出的深度学习模型可以从不足采样的DWI数据中生成高质量的ADC图,比在比较的替代深度学习方法中,其性能在不同级别的图像,肿瘤,肾脏和牙齿上进行了量化。肌肉。结论:具有集成CNN和变形金刚的深度学习方法提供了一种有效的手段,可以从使用RAD-DW-SE方法中获取的不足采样的DWI数据中准确计算ADC映射。
translated by 谷歌翻译
大多数现实世界情景的环境,如商场和超市始终变化。预构建的地图,不会占这些变化的内容容易过时。因此,有必要具有环境的最新模型,以促进机器人的长期运行。为此,本文呈现了一般终身同时定位和映射(SLAM)框架。我们的框架使用多个会话映射表示,并利用一个有效的地图更新策略,包括地图建筑,姿势图形细化和稀疏化。为了减轻内存使用情况的无限性增加,我们提出了一种基于Chow-Liu最大相互信息生成树的地图修剪方法。在真正的超市环境中,通过一个月的机器人部署全面验证了拟议的SLAM框架。此外,我们释放了从室内和户外变化环境中收集的数据集,希望加速在社区中的终身猛烈的Slam研究。我们的数据集可在https://github.com/sanduan168/lifelong-slam-dataset中获得。
translated by 谷歌翻译
声学和视觉感测可以在人操纵时支持容器重量和其内容量的非接触式估计。但是,Opaquent和透明度(包括容器和内容的透明度)以及材料,形状和尺寸的可变性都会使这个问题具有挑战性。在本文中,我们向基准方法提出了一个开放框架,用于估计容器的容量,以及其内容的类型,质量和量。该框架包括数据集,明确定义的任务和性能测量,基线和最先进的方法,以及对这些方法的深入比较分析。使用单独的音频或音频和视觉数据的组合使用具有音频的神经网络的深度学习,用于分类内容的类型和数量,无论是独立的还是共同。具有视觉数据的回归和几何方法是优选的,以确定容器的容量。结果表明,使用仅使用Audio作为输入模块的方法对内容类型和级别进行分类,可分别获得加权平均F1-得分高达81%和97%。估计仅具有视觉视觉的近似接近和填充质量的容器容量,具有视听,多级算法达到65%的加权平均容量和质量分数。
translated by 谷歌翻译
本文回顾了关于压缩视频质量增强质量的第一个NTIRE挑战,重点是拟议的方法和结果。在此挑战中,采用了新的大型不同视频(LDV)数据集。挑战有三个曲目。Track 1和2的目标是增强HEVC在固定QP上压缩的视频,而Track 3旨在增强X265压缩的视频,以固定的位速率压缩。此外,轨道1和3的质量提高了提高保真度(PSNR)的目标,以及提高感知质量的2个目标。这三个曲目完全吸引了482个注册。在测试阶段,分别提交了12个团队,8支球队和11支球队,分别提交了轨道1、2和3的最终结果。拟议的方法和解决方案衡量视频质量增强的最先进。挑战的首页:https://github.com/renyang-home/ntire21_venh
translated by 谷歌翻译
Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
translated by 谷歌翻译
Brain midline shift (MLS) is one of the most critical factors to be considered for clinical diagnosis and treatment decision-making for intracranial hemorrhage. Existing computational methods on MLS quantification not only require intensive labeling in millimeter-level measurement but also suffer from poor performance due to their dependence on specific landmarks or simplified anatomical assumptions. In this paper, we propose a novel semi-supervised framework to accurately measure the scale of MLS from head CT scans. We formulate the MLS measurement task as a deformation estimation problem and solve it using a few MLS slices with sparse labels. Meanwhile, with the help of diffusion models, we are able to use a great number of unlabeled MLS data and 2793 non-MLS cases for representation learning and regularization. The extracted representation reflects how the image is different from a non-MLS image and regularization serves an important role in the sparse-to-dense refinement of the deformation field. Our experiment on a real clinical brain hemorrhage dataset has achieved state-of-the-art performance and can generate interpretable deformation fields.
translated by 谷歌翻译
Adversarial imitation learning (AIL) has become a popular alternative to supervised imitation learning that reduces the distribution shift suffered by the latter. However, AIL requires effective exploration during an online reinforcement learning phase. In this work, we show that the standard, naive approach to exploration can manifest as a suboptimal local maximum if a policy learned with AIL sufficiently matches the expert distribution without fully learning the desired task. This can be particularly catastrophic for manipulation tasks, where the difference between an expert and a non-expert state-action pair is often subtle. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of multiple exploratory, auxiliary tasks in addition to a main task. The addition of these auxiliary tasks forces the agent to explore states and actions that standard AIL may learn to ignore. Additionally, this particular formulation allows for the reusability of expert data between main tasks. Our experimental results in a challenging multitask robotic manipulation domain indicate that LfGP significantly outperforms both AIL and behaviour cloning, while also being more expert sample efficient than these baselines. To explain this performance gap, we provide further analysis of a toy problem that highlights the coupling between a local maximum and poor exploration, and also visualize the differences between the learned models from AIL and LfGP.
translated by 谷歌翻译